RAG Evaluation Toolkit
GENERATOR
66.0%
The Generator is the LLM inside the RAG system that generates the answers.
RETRIEVER
65.0%
The Retriever fetches relevant documents from the knowledge base according to a user query.
REWRITER
62.5%
The Rewriter modifies the user query to match a predefined format or to include the context from the chat history.
ROUTING
100.0%
The Router filters the user's query based on their intent (intent detection).
KNOWLEDGE_BASE
0.0%
The knowledge base is the set of documents given to the RAG system to generate the answers. Its score is computed differently from those of the other components: it is the difference between the maximum and minimum correctness scores across all topics of the knowledge base.
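The max-minus-min computation described above can be sketched as follows. This is a minimal illustration, not the toolkit's own code: the function name `knowledge_base_spread` and the topic names and values are hypothetical, and how the report turns this difference into the percentage it displays is not stated here.

```python
def knowledge_base_spread(correctness_by_topic: dict[str, float]) -> float:
    """Return the gap between the best and worst per-topic correctness.

    Hypothetical helper: it implements only the "maximum minus minimum"
    difference described in the text, on scores in the range 0.0 to 1.0.
    """
    if not correctness_by_topic:
        raise ValueError("at least one topic is required")
    scores = list(correctness_by_topic.values())
    return max(scores) - min(scores)


# Illustrative values only; these are not taken from the report.
example = {"topic_a": 0.75, "topic_b": 0.25, "topic_c": 0.5}
spread = knowledge_base_spread(example)
print(f"{spread:.1%}")  # prints 50.0%
```

A single-topic knowledge base always yields a difference of zero under this definition, so the value is only informative when several topics are present.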
Overall Correctness Score
66%
RECOMMENDATION
The RAG system needs significant improvements in handling "distracting" and "situational" questions, since these received the lowest scores. Additionally, enriching the data or improving the retriever component should be prioritized for topics with a score of zero, such as "Algae and Cyanobacteria Research", "Biomass and Biochar Studies", and several others, to improve overall performance.
CORRECTNESS BY TOPIC
KNOWLEDGE BASE OVERVIEW
SELECTED METRICS